Enhancing structure relaxations for first-principles codes: 
an approximate Hessian approach 
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We present a method for improving the speed of geometry relaxation by using a harmonic ap- 
proximation for the interaction potential between nearest neighbor atoms to construct an initial 
Hessian estimate. The model is quite robust, and yields approximately a 30% or better reduction in 
the number of calculations compared to an optimized diagonal initialization. Convergence with this 
initializer approaches the speed of a converged BFGS Hessian, therefore it is close to the best that 
can be achieved. Hessian preconditioning is discussed, and it is found that a compromise between an 
average condition number and a narrow distribution in eigenvalues produces the best optimization. 
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I. INTRODUCTION 

In many cases the slowest step in a density functional 
calculation (DFT) or other ab initio calculations is find- 
ing the optimal atomic positions which minimize the to- 
tal energy. With older minimization approaches, such as 
the conjugate gradient method, the number of evalua- 
tions scales proportionally with the systeFcommm size. 
More powerful are quasi-Newton methods, in particular 
the Broydcn-Flctchcr-Goldfarb-Shanno (BFGS) method, 
which can show quadratic convergence provided that 
breakdowns of the curvature condition (discussed later) 
are protected against. Essential to the quasi-Newton 
methods are estimates for the gradient and curvature of 
the potential energy surface; the latter being stored in a 
matrix commonly referred to as the Hessian, which con- 
tains all second derivatives (or atomic force constants). 
The classic BFGS method uses a simple diagonal matrix 
as the initial Hessian estimate, perhaps with the initial 
diagonal term using the Shanno-Phua scaling^; see also 
the discussion by Nocedal and Wright^i. In principle one 
could achieve far better convergence by some appropri- 
ate choice of the initial Hessian estimate, as suggested by 
some recent analysis^i^i^. 

In this paper, we detail an approach for improving on 
the estimate of the starting Hessian, using a harmonic po- 
tential describing the interactions between nearest neigh- 
bor atoms. We find that it is important to combine this 
with a diagonal component plus an appropriate scaling 
term. Slightly unexpectedly, what turns out to be im- 
portant is a balance between making the initial Hessian 
estimate replicate that of the true problem and keeping 
the condition number of the estimate small. 

The structure of this note is as follows. First, we 
briefly review conventional optimization methods (Sec- 
tion [Hi, with some comments about how they might be 
improved for density functional theory (DFT) calcula- 
tions. Second, we outline the algorithm for generating 
the Hessian estimate and implementing it into the all- 
electron (linearized) augmented-plane wave 4- local or- 



bitals (L/APW+lo) package WIEN2ki (SectionHIJ). The 
robustness of the program is tested by performing ge- 
ometry relaxations for various classes of materials (Sec- 
tion [IV|. Finally, we conclude with a discussion on the 
importance of Hessian preconditioning, and we propose 
a general scheme for resolving these problems. 



II. OPTIMIZATION METHODS 

At the heart of quasi-Newton methods is an expansion 
of the energy in the form 



E'i = E- 



(1) 



where E^ is the predicted energy, E and g are the energy 
and gradient for a step s from the current state, and H is 
the Hessian matrix. The optimum step can be obtained 
directly in principle as 



H 



(2) 



assuming that the Hessian is known. The concept of a 
quasi-Newton method is to calculate an approximation 
to the Hessian (or its inverse depending upon the exact 
method used) from previous gradient information. The 
most successful approaches use what are called secant 
methods^, in particular the Broyden-Flecher-Goldfarb- 
Shanno (BFGS) inetho(&d0JJJ^, The most important 
contribution from these minimization algorithms is the 
use of Hessian updating techniques, which allow for the 
collection of more information about the energy surface. 
In general, after each cycle the Hessian is updated dur- 
ing the minimum search until the convergence criterion 
is satisfied. It is important to recognize that conver- 
gence can be achieved without ever reaching the true 
Hessian, which suggests that the efficiency of the struc- 
ture relaxation depends on both the starting geometry 
and the initial conditioning of the Hessian estimate (dis- 
cussed later). In fact, the true Hessian of the problem 
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is not always the optimal one, and a compromise be- 
tween conditioning and accuracy is much more desirable 
for optimization problems; as Baali has shown much of 
the success of quasi- Newton methods relies on self-scaling 
algorithm s ^•^i^'^ . The first estimate for the Hessian is usu- 
ally a unitary matrix, although this is not required if 
physical knowledge of the system is available. For in- 
stance, in an earlier version of the WIEN2k codei an 
estimate of the bonding force constants and atom multi- 
plicities was used for the initial diagonal elements — this 
worked much better than a simple constant. As we will 
see, one can do better than this. 

The mathematics behind the secant method is that a 
typical iteration for the minimization of /(x) is given by 
the form 

Xk+i =xu+ a^dk (3) 

where dk = — B^^V/(xfc) and B/j is the approximation 
for the true Hessian that is updated and the step size ak 
is chosen by a line search or a trust-region method (as 
here)i^^'ii. 

For any two consecutive iterations, Xk and x^+i, 
with their gradients, V/ (xk) and V/ {xk+i) information 
about the curvature of the surface (the Hessian) is known 
since 

[V/ (xk+i) - V/ {xk)] « Bfc+i [xk+i - Xk] (4) 

writing Sfe = Xk+i - Xk and = V/ {xk+i) - V/ (x^), 
this can be rewritten as 



Qfe = Bfe+iSfc. 



(5) 



The expression given in Eq. [5] is known as the secant 
equation. An important constraint is that Bfe^_i needs 
to be positive definite for the step to be downhill. Mul- 
tiplying Eq. [5] on the left by Sfe yields what is called the 
curvature condition s/j ■ > 0. This is equivalent to the 
geometric interpretation that over the step length the ob- 
ject function has positive curvature (i.e., the step is taken 
in a lower energy direction). When this condition is sat- 
isfied, Eq. [5] will always have a solution and the BEGS 
update 



B 



fe+i 



Bfe + ABfc, ABi 



BfcSfcS^Bfc 
SfcBfcSfc 



(6) 



will maintain a positive definite approximation to the 
Hessian. 

It is worth mentioning that the curvature condition 
does not always hold, so it must be explicitly enforced 
otherwise the BEGS method can fail completely; this 
is one of the weaknesses of these updating methods. 
This often occurs when the character of the Hessian 
changes substantially during the course of the minimiza- 
tion, which is more likely to occur if one starts far from 
the minimum. Fortunately, the BEGS update is rather 
well behaved, in that the Hessian estimate will tend 
to correct itself in a few steps, as compared to other 
approaches^. Three conventional techniques exist for 
handling the case when the curvature condition fails: 



1 . The calculations are restarted from the current po- 
sition with a diagonal initial estimate. 

2. A skipping strategy is employed on the BEGS up- 
date (Bfc+i = Bfc). 

3. The use of a revised (damped) BEGS update^ 
which modifies the definition of q^. 

For the first case, any important curvature information 
is lost and previous steps are essentially wasted. The 
second technique allows one to incorporate the curvature 
information at previous iterations. However, it requires 
careful control, and too many updates may be skipped 
resulting in further loss of curvature information. (The 
limited memory method^ii^ can do this better because it 
skips steps far from the current location.) The particular 
code we employed used the third method where the scalar 
tfe is defined by = (0.2) B^Sfe > and = Okqk + 
(l-0fc)BfeSfc for 



Sjt BfcSfc-s^ qfc ' 



if 

if 



Sfe qfc > tfc 
sjqfe < tfc 



The BEGS update is then given just as in Eq. [6l with 
qfc replaced by Ufc. This formulation enforces the curva- 
ture condition, and allows for an interpolation between 
the unmodified Hessian update {9k = 1) and the Hes- 
sian at the current iterate. As a consequence, every step 
contributes to defining the curvature, and no steps are 
wasted 

Much of the previous discussion has been concerned 
with the BEGS update, but selection of the step size ak 
(and direction) merits attention. In the code we have 
(dmng.f from NETLIB with some minor changes) the 
entire BEGS update is wrapped within a trust-region al- 
gorithm which is used to calculate the best step to take 
based on a quadratic model of the objection function (the 
PES). While line search methods can be used, for a DET 
problem where the gradient comes essentially for free, the 
most efficient approach is a trust-region algorithmic. In 
this method information about the object function / is 
collected to construct a quadratic model function C that 
is said to adequately sample / in the neighborhood of the 
current iterate. The model function 



k+l) 



fisk) 



gfc Sfe+1 



1 



Sfc+iBfcSfc+1 



uses the current estimate of the Hessian and imposes an 
additional constraint on the step length. ||sfc_|.i|| ^ R 
where R is the trust region radius. The step size which 
minimizes / is then chosen such that it sufficiently min- 
imizes C over the trust region; the radius R is then ad- 
justed iteration to iteration according to how well the 
step reduced the function with respect to the predicted 
reduction value determined from the step size (a so called 
effectiveness measure)^. Therefore, if a poor step is 
taken, the radius is decreased, until the current Hes- 
sian approximation is good enough, and then it is sub- 
sequently expanded. Compared to line search methods. 
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this approach may not give the best improvement per 
direction, but often will be faster in terms of the nett 
improvement per function evaluation. 

The routine in our minimization has the added fea- 
ture of using an adaptive trust region method, in that it 
switches between different models in order to determine 
the optimal step size^i^i. The algorithm first calculates 
what it believes is an appropriate step size (such that 
the length of the step is less than the radius of the trust 
region). It is unusual to calculate the exact trust region 
step, so approximate trial steps are found which approx- 
imate the solution in this region. For each iteration the 
algorithm computes the step size as the linear combina- 
tion of the steepest descent direction and a quasi-Newton 
search direction. Different step types are then chosen 
(Fig. [T]), the main ones being: 

1. A restricted Cauchy step (s/^c = ^^HgTIT-' 
trust radius R is smaller than or equal to the 

T 

Cauchy step (sc = ~ jr^^gfc)- The Cauchy step 

is taken along the gradient of the quadratic model 
of the object function and is of a length defined by 
the minimizer of that function. 

2. A full Newton step {sn = — B^^g^) is taken in 
length and direction if the trust region allows it 
(large enough), otherwise a step in the Newton di- 
rection of a limited length is taken, such that it 
satisfies the constraint of R. 

3. A double dogleg step {sdd = sc + -fkisN - sc)) if 
the trust radius is between the Cauchy and New- 
ton steps. The jk parameter (see Fig. [T|) describes 
the length and direction between the Cauchy point 
and the intersection of the trust region. The double 
dogleg may therefore be seen as a compromise be- 
tween the Cauchy step and Newton step, whereby 
the direction and length is given by the line con- 
necting these steps through the intersection of R. 

This approach is used since it provides a stronger bias 
toward the Newton step direction, as an attempt to accel- 
erate the optimization. The purpose of the Cauchy step 
is just to minimize the local model over a space that is 
known to be well defined in the steepest descent direction. 
Since steepest descent directions do not always provide 
the best minimization, alternative candidate steps (and 
directions) are evaluated. Conventional BFGS methods 
as outlined above work very well, and the code we have 
used (with some minor additions) is one of the most re- 
spected and robust versions freely available. For some 
special problems other codes might work better, but this 
is probably close to an universal algorithm. It uses re- 
verse communication, which makes it rather easy to im- 
plement into programs. However, there is still room to 
possibly improve the update method for DFT problems 
because as we have mentioned, the gradient comes almost 
for free: 




FIG. 1: (color online) Schematic illustration of the principle 
steps used in the BFGS algorithm. The curved trajectory 
(dashed-dotted line) corresponds to the optimal path to the 
minimizer of the object function. The step is selected based 
on the trust region (dotted circle) of radius R and how well 
it minimizes the objection function (elliptical contours). The 
Cauchy step (C) is perpendicular to the contour lines of the 
object function at the current position, and the full Newton 
step (N) is shown along the gradient of the object function. 
The double dogleg step (DD) is shown to be biased towards 
the Newton direction according to the implementation of the 
Dennis and Mei algorithm (cf. Ref.— ). 

• Conventional trust region methods discard a bad 
step; it might be better to incorporate this infor- 
mation into the BFGS update then recalculate a 
revised step. 

• Most BFGS codes attempt to keep the local mem- 
ory requirements and CPU time low. However, 
for a DFT problem these are generally negligi- 
ble compared to what the main iterations require. 
Hence one might improve the codes by doing more 
detailed and accurate analysis of plausible steps; 
for instance, going beyond a simple double dogleg 
method. 

• As mentioned above the weakness of the BFGS 
method when the curvature condition fails might be 
something where further research would be useful. 
For instance, one could easily keep a step history 
and switch to a limited-memory method or an up- 
date based upon less than the full number of steps. 
One idea would be to search over all the history of 
previous steps to find an "optimum" Hessian esti- 
mate that will do better than the three conventional 
methods described above. 



III. IMPLEMENTATION 

The general approach for constructing the initial Hes- 
sian approximation is now outlined. First, the symmetry 
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independent atom set is expanded by the appropriate 
operators to construct the full set of atoms in the struc- 
ture. A search for the free variable parameters to be 
optimized is subsequently performed, identifying special 
sites whose positions are not allowed to vary for the par- 
ticular symmetry of the structure. A nearest neighbor 
search algorithm is then carried out over the expanded 
atom net, to a user specified cutoff distance, in order to 
determine the length over which the interatomic poten- 
tial acts. The cutoff terms and bonding strength used 
are discussed later. The elements in a trial Hessian are 
then generated by numerical differentiation of a simple 
pairwise energy with a step size of fO^^ A, which tests 
show to be adequate. The final step is to construct an 
initial estimate to be used in the form 

Binitial = 7Btrial + ??I 

where 7 and 77 are constants, and I is the identity matrix. 

We have experimented with two models for the pair- 
wise energy, a spring model and a simple harmonic ap- 
proximation. The harmonic model consistently outper- 
formed the spring model, for reasons which wc believe 
are associated with the conditioning which we will dis- 
cuss later. The harmonic model can be written as 

AE « ir,,, Ar^ 

where Ar^j is the change in the distance between the 
two atoms i and j, and Ti_j is an appropriate spring 
constant linking them. Here an exponential term is 
used to model the pairwise bond strength, = 
exp [—f (vij — dm) /dm], where 1/ is a user-specified expo- 
nential decay term (discussed later) and dm is the short- 
est nearest neighbor distance. For practical purposes the 
absolute value of the spring constants are not important, 
but only the relative ratios of them. 

After building the full Hessian for the structure, it is 
symmetry reduced to contain only the symmetry inde- 
pendent atoms and transformed to conventional crystal- 
lographic fractional units. Finally, a Cholesky factoriza- 
tion using a LINPACK routine (dchdc) is performed and 
the Hessian approximation is introduced in the first step 
of a slightly modified version of the dmng.f minimization 
routine from NETLIB. This minimizer was incorporated 
into the geometry optimization routine found in WlEN2k 
by one of us (LDM) some time ago, and is now widely 
used. 

All that is required by the user is a file from which 
the crystal structure is read, and a parameter file which 
contains constants used in the model during the Hessian 
construction. The parameter file was found to be quite 
useful, as it allows for the user to tailor the Hessian for 
different types of systems (e.g. soft or hard). The param- 
eters that were found most useful and which have been 
included in the model can be found in Table HI Values 
that have been shown to be quite reasonable for most 
calculations are also listed. 

A description of each parameter follows: Rm defines 
the maximum distance (in atomic units) to which the 



TABLE I: Free model parameters in the Hessian algorithm 
that may be used to customized the estimate. 



Parameter 


Values 


Description 


X 


0.05 


Cutoff term limiting the number of atom pairs 


i> 


1.50-2.50 


Strength of exponential decay bonding term 


Rm 


8-12 


Maximum nearest neighbor distance 


7 


0.20-0.40 


Multiplicative rescaling term 


V 


1.00-4.00 


Additive diagonal rescaling term 



nearest neighbor search algorithm includes atoms for 
building the energy terms; ly is used in the exponen- 
tial decay function (F), which describes the strength of 
the pairwise interaction between atoms. An additional 
cutoff term (x) is used to restrict which atom pairs arc 
included in the gradient calculation. Once the exponen- 
tial weighting factor becomes smaller than this x-value, 
those bonds (or atoms pairs) are no longer considered in 
the force calculation. The affect of varying these param- 
eters is discussed later. Finally, the two scaling terms 7 
and 77. 



IV. RESULTS 

The initializer described above has been used for sev- 
eral months for a range of problems, and appears to be- 
have well for cases where the initial point is both close 
to or far from the minimum. To investigate the effec- 
tiveness of the initializer, we used the all-electron DFT 
code WIEN2k and implemented it into the structure op- 
timization routino^. In WIEN2k space within the unit 
cell is partitioned into spheres that define the atomic re- 
gions and an interstitial region linking them. The basis 
functions are then composed of atom-like wavefunctions 
inside the spheres centered about the ion cores with a 
radius (Rmt) determined by the muffin-tin approxima- 
tion, and outside these atomic regions by a planewave ex- 
pansion. The partial solutions to the Kohn-Sham equa- 
tions are then matched at the interface. For more de- 
tails on WIEN2k see Ref<i, additionally a review of the 
LAPW method has been given by Singh^^. To evaluate 
the performance in detail, we have relaxed in a more sys- 
tematic fashion a series of different structures with vary- 
ing degrees of freedom (relaxation in one or more Carte- 
sian coordinate directions). A summary of these struc- 
tures is given in Table [III For these examples the PBE- 
GGA exchange-correlation functional^^ was used with a 
planewave cutoff of RKniax=7.00. Values for the muffin- 
tin radii used in the calculations can be found in Table 
IIVI for each system. Multiple convergence criteria were 
also required and they are as follows: (f ) the force vector 
on each atom was less than I mRy/a.u.; (2) the energy 
tolerance was 0.1 mRy; and (3) the charge convergence 
within the muffin-tins was 5.0 x 10~^e. All of the op- 
timized energies (Eq) for the structures are available in 
Table Hill 

For each structure, both the number of geometry steps 
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TABLE II: Material systems investigated with relevant crys- 
tallographic information. 



System 


Lattice 


Space Group 
(symmetry) 


Atoms 


D.O.F. 


Si02 


primitive tetragonal 


136 {P42/^nm) 


6 


1 


LaCuOS 


primitive tetragonal 


129 (P4/nmm) 


8 


2 


MgVOa 


centered orthorhombie 


65 (Cmmm) 


10 


3 


SiOa 


rhombohedral 


154 (P3-221) 


9 


4 


BiiTiaOia 


body centered tetragonal 139 (I^/mmm)'^ 


38 


28 



"Symmetry about odd digits (B2cb), where the digits are the mim- 
ber of TiOe octahedra in the perovskite-like fragments of the struc- 
ture. 



required by the minimization routine and the number of 
self-consistent field (SCF) calculations to satisfy the con- 
vergence criterion were reduced w\t\\ our initialized Hes- 
sian. Table [mi lists the number of SCF cycles required to 
achieve an equilibrium structure using both a simple di- 
agonal initializer and our Hessian formulation. Through- 
out the analysis we examine only the deviation in number 
of SCF cycles for each estimate, and note that these val- 
ues are similar to the number of geometry minimization 
steps required. For each case examined, no more than 
three steps were rejected by the Trust-Region algorithm. 
(It is worth mentioning that the diagonal initializer was 
previously optimized to be close to the best, general form 
available, a factor of at least 30% or more better than a 
simple unitary scaling.) In general the number of SCF 
cycles to converge was reduced by 30% compared to the 
reference diagonal Hessian. 



A. Hessian estimate efficiency 

In order to evaluate the accuracy of the initial Hessian 
and its effects on convergence, structure relaxations were 
also performed using a converged BFGS Hessian from 
a previous calculation. We cannot prove that this is the 
best possible Hessian, but from previous experience it ap- 
pears to be very close to optimum. Table Hill shows the 
results for the number of SCF cycles required to reach 
convergence with each of the different Hessians. It is im- 
portant to recognize that the number of SCF cycles re- 
quired with our initial estimates approached that of the 
converged Hessian. As the system grows in complexity 
(size and number) there remains room for improvement 
in the reduction of SCF cycles that can be obtained. The 
deviation between the number of SCF cycles for the con- 
verged Hessian and the initialized Hessian (in for example 
Bi4Ti30i2) can be overcome by using a better potential 
to describe the pairwisc bonding. In these larger struc- 
tures, the Hessian is sensitive to small deviations in the 
off-diagonal elements, for this reason a longer-range in- 
teraction potential may show even better convergence. 
Nonetheless, the experiments suggest we are approaching 
the limit at which an equilibrium structure can be found 
using current optimization methods. As expected, the 
better the initial Hessian estimate of the true curvature 



of the PES, the faster the optimization. Consequently, we 
can conclude that our pairwise potential acting over sev- 
eral nearest neighbors adequately provides an estimate 
of the curvature of the PES. We also offer a more rig- 
orous comparison in the next section by examining the 
eigenvalues of each Hessian matrix. 

We also studied the effect of the Hessian estimate on 
the initial geometry step size used in the BFGS update. 
The length of the first geometry step for each structure 
is given in Table Hill It is clear that there is a decrease in 
the number of SCF cycles required with increasing max- 
imum geometry step size. The increased reduction in ge- 
ometry optimization steps (data not shown) is attributed 
to a more accurate Hessian (e.g. closely approximating 
the eigenvalues) . For a smaller geometry step length the 
time to convergence increases, and the user can be fairly 
well guaranteed that the minimization will proceed sta- 
bly. A more aggressive approach is to increase the geom- 
etry step size permitted in the minimization algorithm, 
which can be done with confidence if the initial Hessian 
resembles the curvature of the PES. The geometry step 
sizes given in Table IIIII also suggest that our estimate is 
better than the standard initialization, since the initial 
step is much larger. It might appear then, that by tak- 
ing a larger geometry step size, it is possible to reduce 
the number of iterations; however, this may result in the 
BFGS update moving in directions of higher energy at 
first, before final convergence is achieved. Increasing the 
step size too aggressively may therefore result in more 
steps than desired. 



B. Hessian conditioning 

It is known that the rate of minimization for steepest 
decent and conjugate methods is related to the condition 
number of the Hessian matrix. The condition number 
is defined as the ratio of t^'max/'j-'min, where Wmax and 
Winin ai'e the largest and smallest eigenvalues, respec- 
tively. Typically, the minimizations steps required scales 
with the condition number—. Essentially, the condition 
number of a matrix measures how sensitive its inverse is 
in changes to the original matrix: for a large condition 
number the inverse of the matrix is very sensitive or un- 
stable; a matrix with a low condition number (bounded 
by unity) is said to be well-conditioned, while a matrix 
with a high condition number is said to be ill-conditioned. 
From our experience, it turned out to be important to 
consider the conditioning of the initial Hessian. 

The results in Table IIIII are from appropriately con- 
ditioned Binitiai matrices, whose properties (condition 
numbers and eigenvalues) are given in Table fVl The cal- 
culation of condition numbers, eigenvalues and their cor- 
responding eigenvectors were performed with standard 
LAPACK routines for real symmetric matrices^!. 

To explore the scaling effect, we present more detailed 
results for the rhombohedral Si02 and Bi4Ti30i2 struc- 
tures. Similar analyses were done on the other structures 
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TABLE III: Summary of optimization results. The number of cycles are given for the default minimization method (nup), the 
Hessian initializer (jiinit) and a converged BFGS Hessian (ncv)- Similarly, the length of the first geometry step sizes from the 
minimization algorithm are given for each approach. A * indicates a Cauchy step, and a # dogleg step, while those without 
any denotation are of the standard Newton type. The absolute values for the starting geometry energy (Es) and the total 
converged energy (-Bo) are provided. 







Iterations 






Step Size 








System 




JlINIT 


ncv 


Adf 


AiNIT 


Acv 


E. (Ry) 


Eo (Ry) 


SiOz 


3 


2 


1 


0.098 


0.035 


0.170 


62.719 


62.757 


LaCuOS 


5 


5 


2 


0.105 


0.233 


0.706 


9.415 


9.480 


MgVOs 


10 


5 


5 


0.153 


0.555* 


0.894* 


2.147 


2.493 


SiOa 


10 


6 


6 


0.152 


0.268 


0.649* 


27.047 


27.100 


Bi4Ti30l2 


31 


21 


14 


0.024 


0.055 


0.184 


58.397 


58.542 



TABLE IV: Muffin-tin radii (-Rmt) used for the total energy 
calculations in the all-electron plane wave code WIEN2k. 



System -Rmt (bohr) 

'SiCh Si=1.70 and 0=1.30 

LaCuOS La=2.40, Cu=2.20, 0=1.70 and S=2.00 

MgVOa Mg=1.60, V=1.60 and 0=1.40 

Si02 Si=1.70 and 0=1.50 

Bi4Ti30i2 Bi=2.28, Ti=1.74 and 0=1.540 



eigenvalues correctly replicate the true PES) at a point in 
space far from the optimal geometry, it is better to have 
the eigenvalues gradually converge toward the true val- 
ues as the system moves from the initial geometry to the 
equilibrium configuration as suggested by Olsen et al^. 
In fact, the BFGS method can achieve convergence with- 
out actually replicating the true Hessian of PES. 



TABLE V: Optimal scaling values for each structure and 
the largest and smallest eigenvalues along with the condition 
number for each model Hessian matrix. The remaining pa- 
rameters were fixed at Rm ~ 10.0, v — 2.00, and x ~ 0.05 for 
structure. 



System 


7 


■n 






SiOa 


2.75 


1.05 


1.0500, 12.050 


11.476 


LaCuOS 


0.15 


1.05 


1.3179, 1.3821 


1.0487 


MgV03 


0.50 


1.05 


1.7298, 2.5868 


1.4954 


Si02 


0.25 


1.05 


1.2998, 1.8722 


1.4404 


BiiTiaOia 


0.12 


1.05 


1.0500, 1.4129 


1.3457 



examined, and the results presented are representative 
of the general trends. Figs. [2] and [3] show the conver- 
gence in energy for various scaling parameters. While 
the condition number of the Hessian is a good estimate 
at how successful the optimization will be, we have found 
that a better metric is to examine the eigenvalues of 
the Hessian matrix. The eigenvalue distributions are 
shown in the left panels of Figs. [2] and [3] for Si02 and 
Bi4Ti30i2. We find that an average condition number 
is best (see for example^^), and a tight cluster of the 
eigenvalues (small standard deviation) is desired. From 
these figures, it is clear that with a wider distribution of 
the Hessian eigenvalues, the structure relaxation perfor- 
mance declines. Additionally, the minimization occurs 
more stably when the eigenvalue distribution is narrow. 

Our study suggests that we want to minimize the ra- 
tio between the largest and smallest eigenvalues (as ex- 
pected) to optimize the condition number. However, we 
do not want the matrix overly conditioned. In fact, the 
most robust geometry relaxation occurs when the eigen- 
values deviate slightly from those of the true curvature. 
Rather than achieving a fully converged Hessian (i.e. the 



V. DISCUSSION 

Throughout this investigation we have attempted to 
reach the optimal rate of geometry relaxation for a given 
structure. We have reached several conclusions on the op- 
timal values of the scaling parameters, and an approach 
to finding them. We note that in optimization problems, 
scaling tends to be the responsibility of the investigator 
owing to the variety of applications; however, at times it 
is not always clear what is the best or most robust means 
of doing so. 

This study has allowed us to build a set of parameters 
that can aid in the relaxation of large condensed struc- 
tures. These parameters listed in Table |T] are in order of 
increasing importance and their effects on convergence 
are now discussed. The most straightforward parameters 
to set are x-i ^ -^m- From our experience, variation in 
these parameters only affected optimization performance 
by on average a few geometry steps (and no change in 
the number of SCF cycles) over the listed range. The 
most significant effect of these parameters was found to 
be on the initial step size in the BFGS update. Most 
notably, increasing v resulted in larger steps sizes by a 
few percent. 

In order to further optimize the efficiency of the min- 
imization algorithm, scaling of the Hessian through the 
7 and 77 parameters was investigated. The best Bjnitiai 
matrix seems to be a balance between appropriately scal- 
ing the diagonal elements with respect to the off-diagonal 
elements and eigenvalue clustering. As we have shown, 
intelligent choices for the scale parameters can enhance 
performance. We note that in our model as the ratio 
of 7/77 increases the condition number of the Hessian in- 
creases exponentially. 
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(a) 



(b) 



(c) 



(d) 



(e) 



H 1 H 



H 1 h 



Eigenvalue(crJ) 



. (a) y= 0.25,11 = 1. 05 
» (b) Y=0.50,r|= 1.05 

■ (c) Y= 2.75, r| = 1.05 
. (d) Converged Hessian from (e) 

■ (e) Default optimization 




Eigenvalue (a)) 



SCF cycle 



FIG. 2: (color online) Left panels: Eigenvalue distribution of the Hessian matrix for the rhombohedral Si02 structure with 
different Hessian scaling values (a) 7 = 0.25, rj = 1.05; (b) 7 = 0.50, 77 = 1.05; (c) 7 = 2.75, 77 = 1.05; (d) Converged BFGS 
Hessian; and (e) the default optimization. The standard deviation (a) for the eigenvalues is given for each case. Right panel: 
Convergence of the total energy as a function of the SCF cycle number for each calculation. Eq is the converged value of the 
total energy for each run. 



(b) 



envalue((0) 



(c) 




3 4 
Eigenvalue (oi) 




SCF cycle 



FIG. 3: (color online) Left panels: Eigenvalue distribution of the Hessian matrix for the Bi4Ti30i2 structure with different 
Hessian scaling values (a) 7 = 0.25, r; — 1.05; (b) 7 = 0.50, rj = 1.05; and (c) the default optimization. The standard deviation 
(a) for the eigenvalues is given for each case. Right panel: Convergence of the total energy as a function of the SCF cycle 
number for each calculation. Eo is the converged value of the total energy for each run. 



We have also found that our conditioning method has 
led to an increase in the number of Newton-type steps 
in the minimization algorithm. This fact suggests that 
the algorithm may be behaving more like metric based 
minimization techniques, i.e. Newton methods where ad- 
jacent steps are forced to be conjugant to each other 
(sfe_|_iBfcSfc = 0). In addition, the convergence depen- 
dency on the eigenvalue structure has been know for 



sometime for conjugate gradient methods2ii2^, but this 
behavior for BFGS methods is less well-documented. In 
fact, the eigenvalue distribution has been reported to 
have minimal influence on convergence rates through 
tests performed on large molecules^S. However, as the 
object function more accurately describes the PES, i.e. as 
harmonic about the minimum, the BFGS update behaves 
like a variable metric method in the sense that steps of 
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optimal length are chosen. Furthermore, the eigenvalue 
cluster phenomenon is consistent with quasi-Newton type 
updating as Morales and others have shown^. The eigen- 
value spectrum of a matrix can then be preconditioned to 
form narrow clusters in order to accelerate convergence in 
optimization problems. The preconditioning effects are 
a general property for solutions to linear systems and for 
that reason are scalable. Our results may then be applied 
to preconditioning a large set of structures requiring op- 
timization. 

While generalized minimization algorithms arc neces- 
sary foundations for structure calculations, tailoring of 
the geometry relaxation routines provides a robust means 
for enhancing performance. Of course, the optimal ap- 
proach will be different for different structures. For prac- 
tical purposes, it is important that these geometry relax- 
ations be run with very little parameter adjustments by 
the user. Therefore from the previous considerations, 
only variations in the parameters which affect the condi- 
tioning of the matrix and the clustering of the eigenvalues 
should be considered, i.e. reduce the standard deviation 
in the eigenvalues of Bjnitiai- The effect of this clustering 
seems to be a consequence of the optimization method 
(the BFGS update) and is still being investigated. 



VI. CONCLUSION 

We have developed a customizable model (any interac- 
tion potential can be substituted) which adequately ap- 



proximates the curvature of the potential energy surface 
of a crystal structure. The model has been parameter- 
ized to allow for modification for different system types. 
We have shown that our method results an approximate 
30% decrease in the number of SCF cycles required to 
achieve an equilibrium structure relative to the standard 
routines. In fact, our estimate is shown to closely repli- 
cate the behavior of a converged BFGS Hessian. The ef- 
fects of preconditioning have also been investigated, and 
a general approach for enhancing the rate of convergence 
through scaling factors has been suggested. 
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